Approximating Sparse PCA from Incomplete Data
Authors
Abstract
We study how well one can recover the sparse principal components of a data matrix using a sketch formed from a few of its elements. We show that, for a wide class of optimization problems, if the sketch is close (in the spectral norm) to the original data matrix, then one can recover a near-optimal solution to the optimization problem by using the sketch. In particular, we use this approach to obtain sparse principal components and show that, for $m$ data points in $n$ dimensions, $O(\epsilon^{-2}\tilde{k}\max\{m,n\})$ elements give an $\epsilon$-additive approximation to the sparse PCA problem, where $\tilde{k}$ is the stable rank of the data matrix. We demonstrate our algorithms extensively on image, text, biological, and financial data. The results show that not only can we recover the sparse principal components from incomplete data, but using our sparse sketch also reduces the running time by a factor of five or more.
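The abstract describes a two-step pipeline (sparsify the data matrix element-wise, then solve sparse PCA on the sketch) but does not fix the sampling distribution or the sparse PCA solver. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: entries are drawn i.i.d. from a hybrid ℓ1/ℓ2 distribution in the spirit of the hybrid-sampling paper listed among the similar resources, each kept entry is rescaled so the sketch is unbiased, and a truncated power method is swapped in as a simple sparse PCA heuristic. The helper names `hybrid_element_sketch`, `truncated_power`, and `explained_variance`, the mixing weight `alpha`, the sample budget `s`, and the synthetic planted-component data are all hypothetical. The stable rank is computed as $\tilde{k} = \|A\|_F^2 / \|A\|_2^2$.

```python
import numpy as np


def stable_rank(A):
    """Stable rank ||A||_F^2 / ||A||_2^2 (between 1 and rank(A) for nonzero A)."""
    return np.linalg.norm(A, "fro") ** 2 / np.linalg.norm(A, 2) ** 2


def hybrid_element_sketch(A, s, alpha=0.5, rng=None):
    """Hypothetical element-wise sketch (an assumption, not the paper's exact scheme):
    draw s entries i.i.d. with replacement from
        p_ij = alpha*|A_ij|/||A||_1 + (1 - alpha)*A_ij^2/||A||_F^2,
    and rescale each kept entry by count/(s*p_ij) so that E[S] = A."""
    rng = np.random.default_rng() if rng is None else rng
    a = A.ravel()
    p = alpha * np.abs(a) / np.abs(a).sum() + (1 - alpha) * a**2 / np.sum(a**2)
    p = p / p.sum()  # guard against floating-point drift before rng.choice
    idx = rng.choice(a.size, size=s, replace=True, p=p)
    counts = np.bincount(idx, minlength=a.size)
    S = np.zeros_like(a, dtype=float)
    hit = counts > 0  # only entries with p_ij > 0 can ever be drawn
    S[hit] = counts[hit] * a[hit] / (s * p[hit])
    return S.reshape(A.shape)


def truncated_power(C, k, iters=100):
    """Truncated power method: approximate k-sparse leading eigenvector of C."""
    n = C.shape[0]
    x = np.zeros(n)
    x[np.argsort(np.diag(C))[-k:]] = 1.0 / np.sqrt(k)  # start on top-k diagonal
    for _ in range(iters):
        y = C @ x
        keep = np.argsort(np.abs(y))[-k:]  # keep the k largest-magnitude entries
        z = np.zeros(n)
        z[keep] = y[keep]
        nrm = np.linalg.norm(z)
        if nrm == 0:
            break
        x = z / nrm
    return x


def explained_variance(A, x):
    """Sparse PCA objective x^T A^T A x = ||A x||^2, always evaluated on the full A."""
    return np.linalg.norm(A @ x) ** 2


# Synthetic data (hypothetical): one planted k-sparse direction plus dense noise.
rng = np.random.default_rng(0)
m, n, k = 200, 50, 5
v = np.zeros(n)
v[:k] = 1 / np.sqrt(k)
A = 5.0 * np.outer(rng.standard_normal(m), v) + rng.standard_normal((m, n))

S = hybrid_element_sketch(A, s=3000, rng=rng)  # at most 3000 of the 10000 entries
x_full = truncated_power(A.T @ A, k)   # sparse PC computed from the full data
x_sk = truncated_power(S.T @ S, k)     # sparse PC computed from the sketch only
ratio = explained_variance(A, x_sk) / explained_variance(A, x_full)

print(f"stable rank of A: {stable_rank(A):.2f}")
print(f"relative spectral error ||A - S||_2 / ||A||_2: "
      f"{np.linalg.norm(A - S, 2) / np.linalg.norm(A, 2):.3f}")
print(f"variance captured, sketch-based PC / full-data PC: {ratio:.3f}")
```

The ratio compares both sparse components on the original matrix, which mirrors the additive-approximation question in the abstract: how much of the optimal objective is lost by solving on the sketch. Raising `s` should shrink the spectral error of the sketch; the ℓ1 term weighted by `alpha` keeps very small entries from being blown up by their tiny squared-magnitude probabilities, which pure ℓ2 sampling would do.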
Similar resources
Recovering PCA and Sparse PCA via Hybrid-(l1, l2) Sparse Sampling of Data Elements
This paper addresses how well we can recover a data matrix when only given a few of its elements. We present a randomized algorithm that element-wise sparsifies the data, retaining only a few of its entries. Our new algorithm independently samples the data using probabilities that depend on both squares (ℓ2 sampling) and absolute values (ℓ1 sampling) of the entries. We prove that this hybrid al...
Sparse Kernel Principal Component Analysis
'Kernel' principal component analysis (PCA) is an elegant nonlinear generalisation of the popular linear data analysis method, where a kernel function implicitly defines a nonlinear transformation into a feature space wherein standard PCA is performed. Unfortunately, the technique is not 'sparse', since the components thus obtained are expressed in terms of kernels associated with every trainin...
Greedy Bilateral Sketch, Completion & Smoothing
Recovering a large low-rank matrix from highly corrupted, incomplete or sparse outlier overwhelmed observations is the crux of various intriguing statistical problems. We explore the power of the “greedy bilateral (GreB)” paradigm in reducing both time and sample complexities for solving these problems. GreB models a low-rank variable as a bilateral factorization, and updates the left and right fact...
Sparse Statistical Deformation Model for the Analysis of Craniofacial Malformations in the Crouzon Mouse
Crouzon syndrome is characterised by the premature fusion of cranial sutures. Recently the first genetic Crouzon mouse model was generated. In this study, micro-CT skull scans of wild-type mice and Crouzon mice were investigated. Using nonrigid registration, a wild-type craniofacial mouse atlas was built. The atlas was registered to all mice providing parameters controlling the deformations...
Sparse Additive Matrix Factorization for Robust PCA and Its Generalization
Principal component analysis (PCA) can be regarded as approximating a data matrix with a low-rank one by imposing sparsity on its singular values, and its robust variant further captures sparse noise. In this paper, we extend such sparse matrix learning methods, and propose a novel unified framework called sparse additive matrix factorization (SAMF). SAMF systematically induces various types of...
Publication date: 2015